Data Cleaning and Matching of Institutions in Bibliographic Databases

Authors

  • Jeffrey Fisher
  • Qing Wang
  • Paul Wong
  • Peter Christen
  • Paul Kennedy
  • Lin Liu
  • Kok-Leong Ong
  • Andrew Stranieri
Abstract

Bibliographic databases are very important for a variety of tasks for governments, academic institutions and businesses. These include assessing research output of institutions, performance evaluation of academics and compiling university rankings. However, incorrect or incomplete data in such databases can compromise any analysis and lead to poor decisions and financial loss. In this paper we detail our experience with an entity resolution project on Australian institution data using the SCOPUS bibliographic database. The goal of the project was to improve the entity resolution of institution data in SCOPUS so it could be used more effectively in other applications. We detail the methodology including a novel approach for extracting correct institution names from the values of one of the attributes. Along with the results from the project we present our insights into the specific characteristics and difficulties of the Australian institution data, and some techniques that were effective in addressing these. Finally, we present our conclusions and describe other situations where our experience and techniques could be applied.

Similar resources

Entity Resolution of Institutions in Bibliographic Databases

Acknowledgements Many people have assisted me in carrying out this project. Firstly I would like to thank my academic supervisors, Associate Professor Peter Christen and Dr. Qing Wang for their ideas, support, encouragement and feedback. I would also like to thank Dr. Paul Wong from the ANU Research Office for providing me with a place to work and helpful advice on the project itself and the SC...

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

Web-based Affiliation Matching

Authors of scholarly publications state their affiliation in various forms. This kind of heterogeneity makes bibliographic analysis tasks on institutions impossible unless a comprehensive cleaning and consolidation of affiliation data is performed. We investigate automatic approaches to consolidate affiliation data to reduce manual work and support scalability of affiliation analysis. In partic...

Comparison of Bibliographic Databases in Retrieving Information on Telemedicine

Background & Aims: Some of the main questions which can be of importance for those researchers who intend to perform a systematic review in a field of science are: ‘What databases should I use for my review?’; ‘Do all these databases have the same value?’; and ‘Which sources retrieved the highest number of relevant references?’. The main aim of this work was the identification of the best database for ...

Functional Requirements for Bibliographic Records: A New Approach to Organizing Bibliographic Elements

Functional Requirements for Bibliographic Records (FRBR) is a conceptual model for the arrangement of bibliographic records in catalogs and databases which was proposed in IFLA 1997, following a plan for revising Anglo-American Cataloging Rules (AACR). This model is inclined to be separated from the other cataloging rules, and uses a new structure for storing and displaying bibliographic record...

Publication date: 2013